New function `offline_run!` to write data during `run!` #815

mastrof · 2023-06-03T14:40:50Z

Replaces #630

codecov-commenter · 2023-06-03T14:48:04Z

Codecov Report

Merging #815 (28c75b0) into main (210e3ac) will increase coverage by 0.26%.
The diff coverage is 80.28%.

```diff
@@            Coverage Diff             @@
##             main     #815      +/-   ##
==========================================
+ Coverage   69.76%   70.02%   +0.26%
==========================================
  Files          41       42       +1
  Lines        2636     2706      +70
==========================================
+ Hits         1839     1895      +56
- Misses        797      811      +14
```

| Impacted Files | Coverage Δ |
|---|---|
| ...AgentsOSMVisualizations/AgentsOSMVisualizations.jl | `0.00% <ø> (ø)` |
| ext/AgentsVisualizations/src/abmplot.jl | `0.00% <ø> (ø)` |
| ext/AgentsVisualizations/src/convenience.jl | `0.00% <ø> (ø)` |
| ext/AgentsVisualizations/src/daisyworld_def.jl | `0.00% <ø> (ø)` |
| ext/AgentsVisualizations/src/inspection.jl | `0.00% <ø> (ø)` |
| ext/AgentsVisualizations/src/interaction.jl | `0.00% <ø> (ø)` |
| ext/AgentsVisualizations/src/lifting.jl | `0.00% <ø> (ø)` |
| ext/AgentsVisualizations/src/model_observable.jl | `0.00% <ø> (ø)` |
| ext/AgentsVisualizations/src/utils.jl | `0.00% <ø> (ø)` |
| src/submodules/io/AgentsIO.jl | `100.00% <ø> (ø)` |
| ... and 2 more | |


Datseris

Is CSV the best thing to use here? Aren't there any other binary-based data formats that can be used instead, and loaded directly into a dataframe...?

Project.toml

src/simulations/collect.jl

mastrof · 2023-06-04T15:32:44Z

> Is CSV the best thing to use here? Aren't there any other binary-based data formats that can be used instead, and loaded directly into a dataframe...?

Is the problem with CSV "only" that it is not binary, or something else?
At the moment I don't know an easy way to append to a file with e.g. JLD2. I believe you would have to reload the dataframe, modify it in place and save it again, which would make the whole thing useless. Maybe there is a way, but I don't know it. I will look into some of the larger simulation packages (e.g. Oceananigans comes to mind); maybe someone has already found a solution to this.
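For context on the append problem, here is a minimal stdlib-only sketch of a binary layout that can be appended to without reloading anything. It is not JLD2, Arrow, or any Agents.jl API; `append_rows!`, `read_rows` and the fixed `NCOLS` are hypothetical. Fixed-width `Float64` rows are written as raw bytes to a file opened in `"a"` mode, so existing data is never read or rewritten.

```julia
# Hypothetical helpers: append fixed-width Float64 rows to a raw binary file.
const NCOLS = 3  # assumed number of columns per row

function append_rows!(filename::AbstractString, rows::Matrix{Float64})
    size(rows, 2) == NCOLS || throw(ArgumentError("expected $NCOLS columns"))
    # "a" opens for appending: existing bytes are never read or rewritten
    open(filename, "a") do io
        # write row by row so that the file layout is row-major
        for r in eachrow(rows)
            write(io, collect(r))
        end
    end
    return filename
end

function read_rows(filename::AbstractString)
    bytes = read(filename)                  # whole file as Vector{UInt8}
    flat = reinterpret(Float64, bytes)      # view the raw bytes as Float64
    # each consecutive block of NCOLS values is one row
    return permutedims(reshape(collect(flat), NCOLS, :))
end

path = tempname()
append_rows!(path, [1.0 2.0 3.0])
append_rows!(path, [4.0 5.0 6.0; 7.0 8.0 9.0])
data = read_rows(path)
```

The catch is that such raw files carry no schema (column names, element types), which is exactly what self-describing binary formats such as Arrow add on top.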

Datseris · 2023-06-04T16:41:39Z

I am surprised that DataFrames.jl does not provide a binary file format that can also be read and written to line-by-line. Seems like the data science people would have figured this out by now.

Datseris · 2023-06-04T16:42:43Z

Yes, the problem with CSV is "only" that it is non-binary, but this is a rather big problem. If you are using this functionality, you probably have a lot of data, and to my knowledge text-based formats are among the least efficient ways to store numerical data on disk. So the files you fill with this will be very large.
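A quick back-of-the-envelope check of the disk-space argument, as a sketch: one `Float64` takes 8 bytes in binary, while its shortest round-trip text form (roughly what a text-based writer has to emit to preserve full precision) is often more than twice that.

```julia
# Storage cost of a single Float64: raw binary vs. its text representation
x = 1 / 3
binary_bytes = sizeof(x)             # a Float64 is always 8 bytes
text_bytes = ncodeunits(string(x))   # string(x) == "0.3333333333333333", 18 bytes
# a CSV field additionally needs a delimiter or a newline
csv_bytes = text_bytes + 1
println("binary: $binary_bytes B, text: $text_bytes B, CSV field: $csv_bytes B")
```

Values with short decimal forms such as `0.5` fare better as text, so the exact ratio depends on the data being collected.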

mastrof · 2023-06-10T11:40:57Z

Haven't taken time to look for an alternate solution to CSV yet.
But if we agree on the current logic then it will be just a matter of substituting CSV.write with a different write function once an appropriate candidate is found.

src/simulations/collect.jl

Datseris

We fully agree on the logic. Don't forget to:

- increment the minor version in Project.toml
- add a changelog entry under the new minor version
- add the new function to the documentation (API page). I guess it should go under the "Save, Load, Checkpoints" section
- mention the function in `run!` by saying "see also [`offline_run!`](@ref)"

It would be great if we could have a more disk-efficient backend to save. But CSV files are nice as well, and some users may prefer them for other reasons. I propose to already anticipate the possibility of different backends and provide a keyword argument that configures which backend should be used.

mastrof · 2023-06-11T15:47:10Z

> I propose to already anticipate the possibility of different backends and provide a keyword argument that configures what backend should be used.

Would something like this work, at the beginning of offline_run!? Other backends could then be included by adding some elseifs, but I have a feeling this might be very bad performance-wise.

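```julia
# Defined inside offline_run!, where `backend` is the keyword argument.
# Note: the backend string is re-checked on every call.
write_to_file(filename, data, append) = if lowercase(backend) == "csv"
    AgentsIO.CSV.write(filename, data; append=append)
else
    throw(ArgumentError("Backend $(backend) not supported."))
end
```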

Would this be improved by defining a "maker" function?

```julia
# Resolve the backend once and return a writer closure for it
function make_writer(backend)
    s = lowercase(backend)
    if s == "csv"
        return (filename, data, append) -> AgentsIO.CSV.write(filename, data; append=append)
    elseif s == "mybackend"  # placeholder for a future backend
        return (filename, data, append) -> MyBackend.write(filename, data, append)
    else
        throw(ArgumentError("Backend $(backend) not supported."))
    end
end
```

And then, inside `offline_run!`, have the line `write_to_file = make_writer(backend)`?

Datseris · 2023-06-11T17:13:56Z

Creating a maker function is indeed a better idea. However, you should also introduce a function barrier: after you initialize all data frames and the maker function, pass all of these into another function. This establishes type stability where it matters.
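A minimal sketch of the function-barrier pattern suggested here, using only Base and hypothetical names: this `make_writer` returns toy writers that print to an `IOBuffer` (not the real CSV/Arrow writers), and `run_with_writer`/`write_loop` are illustrative. The setup picks a writer from a runtime string, so the writer's type cannot be inferred; the hot loop therefore lives in a separate function that Julia compiles for the concrete writer type.

```julia
# Returns a different closure type depending on a runtime string, so the
# return type of make_writer is not inferable from its argument type.
function make_writer(backend::AbstractString)
    s = lowercase(backend)
    if s == "upper"
        return (io, row) -> println(io, uppercase(row))
    elseif s == "plain"
        return (io, row) -> println(io, row)
    else
        throw(ArgumentError("Backend $(backend) not supported."))
    end
end

# Outer function: performs the type-unstable setup once...
function run_with_writer(backend::AbstractString, rows::Vector{String})
    writer = make_writer(backend)
    io = IOBuffer()
    # ...then crosses the barrier: one dynamic dispatch here, after which
    # write_loop runs code specialized on the concrete type of `writer`.
    write_loop(writer, io, rows)
    return String(take!(io))
end

# `where {F}` forces specialization on the writer's concrete type
function write_loop(writer::F, io::IO, rows) where {F}
    for row in rows
        writer(io, row)
    end
    return nothing
end
```

The same shape applies to `offline_run!`: create the data frames and the writer in the outer function, then hand them all to an inner function that contains the stepping and writing loop.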

Datseris

This can't be merged unless conflicts are resolved (rebase with main). Please don't forget the function barrier; I am sure it will have a positive impact on performance.

src/simulations/collect.jl


fbanning · 2023-06-21T11:21:38Z

Arrow backend is implemented now. In the process I've condensed the functions a bit. Also added tests for Arrow functionality and they passed locally on my machine.

The workflow for adding a new backend would now be:

1. Add a `writer_$backend` function taking `filename`, `data`, `append` as inputs.
2. Add the backend to the `get_writer` function.
3. Done (hopefully). :)
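The two steps above can be sketched as follows. This assumes `get_writer` maps a `Symbol` to a `writer_<backend>` function; the toy tab-separated backend `writer_tsv` is hypothetical (the real backends wrap CSV.jl and Arrow.jl), and only Base is used so the sketch runs standalone.

```julia
# Step 1: a toy `writer_tsv(filename, data, append)` backend (hypothetical;
# `data` is assumed to be a vector of NamedTuples with identical keys)
function writer_tsv(filename, data, append)
    # write the header only when starting a new (or empty) file
    write_header = !(append && isfile(filename) && filesize(filename) > 0)
    open(filename, append ? "a" : "w") do io
        write_header && println(io, join(keys(first(data)), '\t'))
        for row in data
            println(io, join(values(row), '\t'))
        end
    end
    return filename
end

# Step 2: register every backend in a single lookup function
function get_writer(backend::Symbol)
    backend === :tsv && return writer_tsv
    # further backends (e.g. a CSV or Arrow writer) would be added here
    throw(ArgumentError("Backend $(backend) not supported."))
end

# Usage: resolve the writer once, then call it for each chunk of data
path = tempname()
write_chunk = get_writer(:tsv)
write_chunk(path, [(step = 0, count = 10)], false)   # fresh file with header
write_chunk(path, [(step = 1, count = 12)], true)    # appended, no header
print(read(path, String))
```

Keeping the lookup in one function means a new backend touches exactly two places, and the writer is resolved once rather than on every write.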

fbanning · 2023-06-21T12:05:09Z

https://github.com/JuliaDynamics/Agents.jl/actions/runs/5333264438/jobs/9663511968?pr=815#step:7:357

Can anyone test this on Windows? I don't have a Windows machine.

src/simulations/collect.jl

Project.toml

fbanning

Sorry, this change doesn't make any sense. When the file doesn't exist or when we want to overwrite an existing file (i.e. !append), then we need to pass file = false to the write function. Please revert it.

Tortar · 2023-07-10T11:58:49Z

Yes, you are right. It was a last, really desperate attempt to solve the problem; I don't know why, but my Windows machine didn't complain about this locally :D. Anyway, I dug into it for some time and I still have no clue why Windows refuses to remove the file.

fbanning · 2023-07-10T12:05:57Z

@Tortar Did you run the tests locally on your windows machine and they all passed? If so, then we can be sure that this is a CI windows problem.

Tortar · 2023-07-10T12:10:19Z

No, I didn't explain myself well. They passed when I changed the lines with the nonsensical approach :D (but this is probably because I also changed something else). They don't pass locally on my Windows machine: the file mdata.arrow can't be removed, and I didn't find any fix.

fbanning · 2023-07-10T12:14:13Z

Ah sorry, misunderstood you then. Hm, really weird that it only fails on Windows but has no problems on Linux and MacOS. No idea how to approach this myself, as well, sorry. :/

docs/src/api.md

Datseris · 2023-07-10T13:18:30Z

Sorry for being away, but I am overwhelmed with other projects and won't have much time for Agents.jl for the next semester or so... @Tortar, @fbanning: I'll give comments if you explicitly ask for them by tagging me, but I will have notifications off for the next semester :(

Now, to finish this: if I understand correctly, the status quo is that this works fine, but doesn't work on Windows, yes? If so, can we then add an error to the first line of the function that checks the OS and prints an error, and open an issue that this function doesn't seem to work as expected on Windows with Arrow? In the test suite we only run the test if the OS is not Windows. Then we can merge? Because everything else seems fine.
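A sketch of this proposal, assuming the check is restricted to the Arrow backend (the comment mentions Arrow specifically; the condition in the merged code may differ) and using a hypothetical `offline_run_sketch` in place of the real function. The OS check itself is `Sys.iswindows()` from Base.

```julia
using Test

# Hypothetical stand-in for offline_run!: refuse the Arrow backend on Windows,
# where the tests run into file-locking problems.
function offline_run_sketch(; backend::Symbol = :csv)
    if backend === :arrow && Sys.iswindows()
        error("The Arrow backend of offline_run! is currently not supported on Windows.")
    end
    # ... the actual data collection would happen here ...
    return backend
end

# Test-suite side: only exercise the Arrow path when not on Windows
@testset "OS guard sketch" begin
    @test offline_run_sketch(; backend = :csv) === :csv
    if Sys.iswindows()
        @test_throws ErrorException offline_run_sketch(; backend = :arrow)
    else
        @test offline_run_sketch(; backend = :arrow) === :arrow
    end
end
```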

fbanning · 2023-07-10T13:30:46Z

> If so, can we then add an error to the first line of the function that checks the OS and prints an error, and open an issue that this function doesn't seem to work as expected on Windows with Arrow? In the test suite we only run the test if the OS is not Windows.

Sure thing. I'll add some clauses for that.

> Then we can merge? Because everything else seems fine.

I'll do that right after pushing the required OS handling changes and after the tests finally pass.

kavir1698 and others added 7 commits June 8, 2022 11:24

Allow writing data during run!

1f2241d

fix conflicts

eef1e0c

define new offline_run! function

aa46620

fix tests

893aaa8

cleanup run!

a1a9361

cleanup run! (again)

1267d2d

fix typo

13fb707

Datseris reviewed Jun 4, 2023

Project.toml (outdated)

src/simulations/collect.jl (outdated)

src/simulations/collect.jl (outdated)

src/simulations/collect.jl (outdated)

Datseris requested changes Jun 4, 2023

src/simulations/collect.jl (outdated)

Datseris mentioned this pull request Jun 7, 2023

Allow writing data during run! #630

Closed

mastrof added 2 commits June 10, 2023 13:22

fix docstrings and use empty! instead of reinitializing dataframes

4c30cd0

remove duplicate CSV from project extras

3c42bba

Datseris reviewed Jun 10, 2023

src/simulations/collect.jl (outdated)

Datseris reviewed Jun 10, 2023

src/simulations/collect.jl (outdated)

Datseris approved these changes Jun 10, 2023

mastrof added 2 commits June 11, 2023 17:08

use semicolon for kwargs and fix docstring

15a6584

mention offline_run! in the run! docstring

c21d2f0

mastrof added 3 commits June 12, 2023 08:50

select writing backend via kwarg and update test

03487ac

update project and changelog

9207cc0

add offline_run! to docs

088a956

Datseris reviewed Jun 13, 2023

src/simulations/collect.jl (outdated)

mastrof added 2 commits June 15, 2023 21:26

introduce function barrier

dd0156c

Merge branch 'main' of https://github.com/JuliaDynamics/Agents.jl into offline_run

e5a6457

Update CHANGELOG

86788bb

Datseris reviewed Jun 21, 2023

src/simulations/collect.jl (outdated)

Project.toml (outdated)

fbanning and others added 12 commits June 21, 2023 17:38

Reintroduce function barrier

7336c91

Move extensions into separate folders

800b5c8

Add AgentsArrow extension

b4f4a62

Remove AgentsIO for Arrow for now

7992063

Turn Arrow into weak dep for ext

875fb8c

Create stub function for extension

fdc63fb

Directly use Arrow in tests

2930f57

Add Arrow as test dependency

216f48a

Merge AgentsIO placeholder funcs into AgentsArrow

3971829

Add placeholders for AgentsIO with Arrow

0f308cd

maybe

c137a7e

Merge branch 'main' into offline_run

80f9adc

fbanning requested changes Jul 10, 2023

revert last change

28c75b0

Datseris reviewed Jul 10, 2023

docs/src/api.md

fbanning approved these changes Jul 10, 2023

fbanning mentioned this pull request Jul 10, 2023

Tests for offline_run! on Windows can't be perfomed due to locking problems #826

Open

Handle Arrow.jl integration on Windows

b5a25ee

Tortar approved these changes Jul 10, 2023

Datseris merged commit 6902229 into JuliaDynamics:main Jul 10, 2023
